- CHAPTER 19 - STRINGS
-
-
- Sometimes we want to deal with long strings of information. Here
- long means hundreds or thousands of bytes, not tens of bytes. The
- 8086 provides a group of instructions to move and compare
- strings. These instructions have a rigid structure, but with a
- little bit of effort we can get them to work easily for us. We
- will start with SCAS, since it is simple, yet embodies all the
- rigid features of these instructions.
-
- SCAS (scan string) compares either a byte to AL or a word to AX.
- The byte or word must be in memory, and the register must be AL
- or AX. SCAS also increments or decrements the pointer. First, the
- size:
-
- scasb
-
- compares a byte to AL, while:
-
- scasw
-
- compares a word to AX. But where's the pointer? You have no
- choice, it's DI. Not only is it DI, but it MUST be ES:DI. The ES
- segment is coded into the 8086 microcode; the DI register is
- coded into the 8086 microcode; there is nothing you can do to
- change it. What about incrementing or decrementing? In the flags
- register, there is a flag called the direction flag. It is set
- manually by the program. If DF = 0, SCAS increments DI; if DF =
- 1, SCAS decrements DI.{1} The equivalent software for the
- instruction would be:
-
-                 (scasb)                 (scasw)
-    DF = 0       cmp al, es:[di]         cmp ax, es:[di]
-                 pushf                   pushf
-                 add di, 1               add di, 2
-                 popf {2}                popf
-
-
-    DF = 1       cmp al, es:[di]         cmp ax, es:[di]
-                 pushf                   pushf
-                 sub di, 1               sub di, 2
-                 popf                    popf
-
- ____________________
-
- 1. Every time you have called show_regs DF has been there; it
- doesn't show 0 and 1, it shows + and - (+ = 0 , - = 1).
-
- 2. The microcode doesn't really push and pop the flags. This
- is only to indicate that the order of operations is (1) get the
- byte (word) from the string, (2) compare and set the flags, and
- finally (3) increment (decrement) the pointer without changing
- any of the flags.
-
- ______________________
-
- The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson
-
- Thus, at the end of the instruction, DI is in a new place and the
- flags are set according to the compare result. DI is incremented
- by 1 for the byte instruction and by 2 for the word instruction.
- The same pattern holds true for decrementing DI.
-
- We set the direction flag with the instruction STD (set direction
- flag) and we clear it with CLD (clear direction flag). It only
- needs to be set or cleared once, and this should be done before
- starting the operation. No arithmetic or logical operation
- changes DF; it changes only when the program uses STD or CLD, or
- reloads the whole flags register with POPF or IRET.
-
- If you have a string and you are looking for a specific number
- (27, for instance), you simply put that number in AL (or AX) and
- run a loop. If:
-
- long_string db 5000 dup (?)
-
- contains data and we want to look for a 27d then the operation
- is:
-
- lea di, long_string{3}
- mov al, 27
- cld
-
- search_loop:
- scasb
- jne search_loop
-
- On exiting, DI will point 1 PAST the matching byte (word). You
- move back one byte (word) to find the match. Why would anyone
- want this instruction? With a 0 in AL, it will find the end of a
- C (0d terminated) string quickly. Also, that number 27 is no
- accident. 27d is the ASCII escape character. For a lot of
- hardware, 27d indicates that the bytes that follow are not ASCII
- characters but are technical information. For instance, on my
- printer the sequence (27d, 65d, 0d, 11d) sets tabs every 11
- columns. SCAS can find where these substrings are so the program
- can operate on them.
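Turning the scan around is mostly a matter of the direction flag. The sketch below is not from the original text: it looks for the LAST 27d in long_string by starting at the final byte with DF = 1. It assumes the 5000-byte definition above and that ES already holds long_string's segment. It also adds a count in CX so the scan cannot run off the front of the string. The label names are made up for the example.

```asm
        ; Assumption: ES holds the segment of long_string (5000 bytes).
        lea     di, long_string + 4999  ; offset of the last byte
        mov     al, 27                  ; byte to look for
        mov     cx, 5000                ; never examine more than 5000 bytes
        std                             ; DF = 1: SCASB decrements DI

back_loop:
        scasb                           ; compare AL with es:[di], then DI = DI - 1
        je      found_27
        loop    back_loop               ; LOOP does not change the flags

        ; fell through: no 27d anywhere in the string
        jmp     short done_scanning

found_27:
        inc     di                      ; DI stopped 1 BELOW the match; step back up
        ; DI now points at the last 27d in the string

done_scanning:
        cld                             ; put DF back to the usual setting
```

A backward scan leaves DI one byte below the match, so the correction is INC rather than DEC. The CLD at the end is there because code elsewhere usually expects DF = 0.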
-
- In order to use string instructions, we need strings to work on.
- The one we will use is called CH1STR.OBJ. It is in \XTRAFILE. It
- is an object file that contains one data segment containing a
- string of lower case characters. The string is several thousand
- bytes long, it is terminated by 0, and it contains ONLY the lower
- case letters (a-z). It is the first draft of part of chapter 0
- with all punctuation, numbers, spaces, carriage returns etc.
- deleted. All upper case letters have been converted to lower case
- so we don't have to worry about the difference between A and a, Q
- and q.
-
- The name of the array in CH1STR.OBJ is CH1STR and it is defined:
- ____________________
-
- 3. Assuming that long_string's segment address is in ES.
-
-
- PUBLIC ch1str
-
- so that you can access it with the SEG and OFFSET operators.
- First, let's find out how long the string is.{4}
-
- MYPROG1.ASM
- ; - - - - - - - - - -
- STRINGSTUFF SEGMENT PUBLIC 'DATA'
- EXTRN ch1str:BYTE
- STRINGSTUFF ENDS
- ; - - - - - - - - - -
- ;- - - - - - - - - - PUT CODE BELOW THIS LINE
-
- mov ax, seg ch1str ; segment address of ch1str
- mov es, ax
-
- mov di, offset ch1str ; offset address of ch1str
- mov al, 0 ; try to match zero
- cld ; increment (DF = 0)
-
- string_end_loop:
- scasb
- jne string_end_loop
-
- dec di ; back up one
- mov ax, di
- sub ax, offset ch1str
- call print_unsigned
-
- ;- - - - - - - - - - PUT CODE ABOVE THIS LINE
-
- In all these string operations, we need to be careful about
- boundary conditions. What if there is no valid data? What if
- there is one valid item? What if the string is empty?
-
- On exiting the loop, DI will point 1 past the first 0d, so we
- need to back up one to point to the first 0d. Then subtracting
- the starting position will give us the count.{5} Try it out and
- find out how long it is. Since we now have 3 object modules, the
- link instruction must read:
-
- link myprog+ch1str+asmhelp ;
-
- assuming that you name your program myprog.asm. Save the result
- because we will need to use this number several times.
-
- ____________________
-
- 4. Just to keep it from being too easy, I have put garbage
- both in front of the string and behind the string. That means
- that the string length is shorter than the length of the object
- file and the string does not start with the first byte of the
- object file.
-
- 5. If the first byte in the string is 0d, we move one, then
- move back one which gives the length zero.
-
-
- You will notice that we have gotten the segment address by using
- the SEG operator. You don't need to know the name of the segment.
- The segment doesn't even have to be PUBLIC. As long as the
- VARIABLE is either in the same file or is in another file and
- PUBLIC, the linker will find the correct segment address and put
- it there.
-
-
- To make things a little more complicated, we will make another
- infinite loop. This time you will enter a character, and the
- program will find the first occurrence of that character. We need
- to add some error checking here. Since you will probably be
- dreaming about taking your next vacation in Hawaii while you are
- entering the data, a few characters that don't exist in the
- string (things like G $ ? ~ ) might creep in. It would be
- possible to run way past the end of the string before you found
- that character. We'll put the length of the string (from the last
- program) in CX, have a regular loop so we can't go too far, and
- jump out of the loop if we find a match.
-
- MYPROG2.ASM
- ; - - - - - - - - - -
- STRINGSTUFF SEGMENT PUBLIC 'DATA'
- EXTRN ch1str:BYTE
- STRINGSTUFF ENDS
- ; - - - - - - - - - -
- ;- - - - - - - - - - PUT CODE BELOW THIS LINE
-
- mov ax, seg ch1str
- mov es, ax
-
- outer_loop:
- call get_ascii_byte ; returns character in al
- mov cx, $$$$$$$ ; enter string length here
-
- mov di, offset ch1str
- cld ; increment (DF = 0)
-
- string_end_loop:
- scasb
- je after_loop ; if equal, we found the char
- loop string_end_loop
-
- mov ax, 0 ; we fell through the loop
- call print_unsigned
- jmp outer_loop
-
- after_loop:
- mov ax, di ; move for printing
- sub ax, offset ch1str ; number of bytes
- call print_unsigned
-
- jmp outer_loop
-
-
- ;- - - - - - - - - - PUT CODE ABOVE THIS LINE
-
-
- Those dollar signs are the place to enter the exact length of the
- string that you got from the last program. This time we jump out
- of the loop if we find a match; DI will be 1 past the matching
- character, but this will give us the right count (if we find the
- character in the first space, we increment once). If we can't
- find a match we fall through the loop and print a 0. Remember to
- link all 3 modules when you run the program. Run the program and
- then we'll move forward.
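The same counted loop carries over to SCASW; the only difference is that DI moves 2 bytes at a time, so the byte distance has to be halved to get a position. The following is a sketch under assumptions, not part of the original programs: word_table and its contents are invented for the example, ES is assumed to hold word_table's segment already, and print_unsigned is the same routine used above.

```asm
; - - - - - - -
word_table  dw  1200, 2400, 4800, 9600  ; hypothetical data
; - - - - - - -

        mov     di, offset word_table
        mov     ax, 4800                ; word to look for
        mov     cx, 4                   ; number of WORDS in the table
        cld                             ; increment (DF = 0)

word_loop:
        scasw                           ; compare AX with es:[di], then DI = DI + 2
        je      found_word
        loop    word_loop

        mov     ax, 0                   ; we fell through the loop
        call    print_unsigned
        jmp     short word_done

found_word:
        mov     ax, di                  ; DI is one WORD past the match
        sub     ax, offset word_table   ; number of bytes
        shr     ax, 1                   ; bytes -> words (1 = first entry)
        call    print_unsigned          ; prints 3 for the table above

word_done:
        ; continue with the program
```

Note that CX counts items, not bytes: it is 4 here even though the table occupies 8 bytes.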
-
- This type of thing is so common with string operations that there
- is a special prefix for SCAS and all other string operations
- which makes the coding simpler. It has several forms:
-
-      rep      decrement cx ; repeat if cx is not zero
-      repe     decrement cx ; repeat if cx not zero and zf = 1
-      repz     decrement cx ; repeat if cx not zero and zf = 1
-      repne    decrement cx ; repeat if cx not zero and zf = 0
-      repnz    decrement cx ; repeat if cx not zero and zf = 0
-
- REP is for the move instructions which we will see later - it
- won't work here. For each prefix, if either (or both) of the
- conditions is not true, the repetition stops. For instance, with
- REPE, if cx is zero, and/or if the comparison was not equal (so
- the zero flag was not set), the instruction will stop. For our
- program, the coding is:
-
- repne scasb
-
- That's it. That replaces the whole inner loop. Here is our new
- coding of the last program.
-
- MYPROG3.ASM
- ; - - - - - - - - - -
- STRINGSTUFF SEGMENT PUBLIC 'DATA'
- EXTRN ch1str:BYTE
- STRINGSTUFF ENDS
- ; - - - - - - - - - -
- ;- - - - - - - - - - PUT CODE BELOW THIS LINE
-
- mov ax, STRINGSTUFF
- mov es, ax
- cld ; increment (DF = 0)
-
- outer_loop:
- call get_ascii_byte ; returns character in al
- mov cx, $$$$$$$ ; enter string length here
- lea di, ch1str ; address of string
-
- repne scasb
-
- je found_the_char ; an equal comparison
- mov ax, 0 ; we didn't find a match
- call print_unsigned
- jmp outer_loop
-
- found_the_char:
- mov ax, di ; move for printing
- sub ax, offset ch1str ; number of bytes
- call print_unsigned
- jmp outer_loop
-
- ;- - - - - - - - - - PUT CODE ABOVE THIS LINE
-
- There are two possibilities for exiting the 'repne scasb'
- instruction. Either we found an equal comparison or we exhausted
- all the characters in ch1str. If we found an equal comparison, JE
- will send us to the print routine. Otherwise we print a 0 because
- we finished the loop without finding anything.
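The prefix also shortens the length calculation from the first program. What follows is one possible sketch, assuming it is dropped into the same template as MYPROG1.ASM (same EXTRN declaration, same print_unsigned routine). Instead of working from DI, it recovers the length from what is left in CX; the labels and the 0FFFFh limit are choices made for the example.

```asm
        mov     ax, seg ch1str          ; segment address of ch1str
        mov     es, ax
        mov     di, offset ch1str       ; offset address of ch1str
        mov     al, 0                   ; try to match zero
        mov     cx, 0FFFFh              ; examine at most 65535 bytes
        cld                             ; increment (DF = 0)

        repne   scasb                   ; stop on a match or when CX reaches 0
        jne     no_terminator           ; CX ran out without finding a 0

        not     cx                      ; 0FFFFh - CX = number of bytes examined
        dec     cx                      ; don't count the 0 itself
        mov     ax, cx
        call    print_unsigned          ; string length
        jmp     short length_done

no_terminator:
        ; no 0 within 65535 bytes; treat as an error

length_done:
        ; continue with the program
```

If the very first byte is 0, one byte is examined, CX ends at 0FFFEh, NOT gives 1, and DEC gives a length of 0, which agrees with the boundary case discussed for the first program.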
-
-
- STOS
-
- We can ask the operating system to allocate memory for us while
- the program is running.{6} When you get it, however, it will
- contain trash. The fast way to clear it is to use STOS (store to
- string). The instruction is:
-
- stosb
- or:
- stosw
-
- The equivalent action (not counting changing the value of DI) is:
-
- mov es:[di], ax ; or AL for byte moves
-
- Once again (1) the pointer is the ES:DI pair, which is mandatory,
- and (2) DI is incremented or decremented (by 1 for byte, by 2 for
- word) depending on the status of DF, the direction flag. The
- instruction moves a byte (a word) from the AL (AX) register to
- the memory address pointed to by ES:DI. We can use the REP{7}
- prefix to speed things up a bit. If we have an 11,872 word
- block of memory, we can clear it with the following instructions:
-
- ; - - - - - - -
- DATASTUFF SEGMENT
- my_buffer dw 11872 dup (?)
- DATASTUFF ENDS
- ; - - - - - - -
-
- mov ax, seg my_buffer
- mov es, ax
- cld ; increment (DF = 0)
-
- mov ax, 0 ; clear the buffer with 0s
- mov di, offset my_buffer
- mov cx, 11872
- rep stosw
- ____________________
-
- 6. Cf. You-know-who's Programmer's Guide to You-know-what or
- "DOS Programmer's Reference."
-
- 7. There is no comparison here, so REPE or REPNE doesn't make
- any sense.
-
-
- That's as fast as it gets. Why does the STOS instruction use AX?
- Because that's the register that port i/o uses. If you are
- writing a communications program, you need speed. You can have
- the following:
-
- ; - - - - - - - - -
- DATASTUFF SEGMENT
- port_address dw 0F2A8h ; this address is legal but
- ; there's nothing there.
- input_buffer db 4000h dup (?)
- output_buffer db 4000h dup (?)
- DATASTUFF ENDS
- ; - - - - - - - - -
-
- mov ax, DATASTUFF
- mov es, ax
- cld ; increment (DF = 0)
- mov di, offset input_buffer
- mov dx, port_address
-
- input_loop:
- in al, dx
- stosb
- jmp input_loop
- ; - - - - - - - - -
-
- A real program would be much more complicated because we would
- have to check to see if data was ready to come in and we might
- need to check the data for errors. Also we would occasionally
- have to clear the buffer. The port address F2A8h is just an
- arbitrary address. It's a legal address but there's nothing
- there.
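One of those complications is easy to show. As written, the input loop never stops, so it will eventually store past the end of input_buffer. A minimal sketch of a bounded version follows; it reuses the names from the fragment above and only adds a count in CX. Like the original, it assumes DS also addresses DATASTUFF so that port_address can be read, and it still does no status or error checking.

```asm
        mov     ax, DATASTUFF
        mov     es, ax
        cld                             ; increment (DF = 0)
        mov     di, offset input_buffer
        mov     dx, port_address        ; assumes DS addresses DATASTUFF too
        mov     cx, 4000h               ; size of input_buffer in bytes

input_loop:
        in      al, dx                  ; read one byte from the port
        stosb                           ; es:[di] = al, then DI = DI + 1
        loop    input_loop              ; stop after 4000h bytes

        ; input_buffer is full; DI = offset input_buffer + 4000h
```

REP cannot be used here because the IN has to be repeated along with the STOSB, so the count is handled by an ordinary LOOP.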
-
- We should write a program, so let's input a character and have it
- fill the screen. We'll leave the last line of the screen alone so
- you can see your input. Move your cursor to the last line before
- beginning the program.
-
- ; - - - - - ENTER CODE BELOW THIS LINE
-
- mov ax, 0B800h ; or 0B000h for a monochrome card
- mov es, ax
- cld ; increment (DF = 0)
-
-
- outer_loop:
- call get_ascii_byte ; AL = fill char from input
- mov ah, 07h ; black background, white letters
- sub di, di ; set di to zero
- mov cx, 1920 ; 24 lines X 80 chars
- rep stosw
- jmp outer_loop
-
- ; - - - - - ENTER CODE ABOVE THIS LINE
-
- If you have a monochrome card, the segment address is 0B000h. If
- you have a color card and are in text mode, the segment address
- should be 0B800h. This fills the first 24 lines with the input
- character. The STOS instruction has no effect on the cursor.
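STOS and SCAS use the same ES:DI pointer and the same AL/AX register, so a fill can be checked immediately with REPE. The fragment below is only a sketch: it assumes ES and DF are set up as in the program above and that AX already holds the attribute/character word. The labels are invented for the example.

```asm
        ; Assumes ES = video segment, DF = 0, AX = word to fill with.
        sub     di, di                  ; start at the top-left cell
        mov     cx, 1920                ; 24 lines X 80 chars
        rep     stosw                   ; fill; AX is left unchanged

        sub     di, di                  ; back to the top-left cell
        mov     cx, 1920
        repe    scasw                   ; repeat while es:[di] equals AX
        jne     cell_differs            ; stopped early: some cell did not match

        ; all 1920 cells hold the word in AX
        jmp     short check_done

cell_differs:
        sub     di, 2                   ; DI is one WORD past the odd cell

check_done:
        ; continue with the program
```

After REPE SCASW, a set zero flag means the last comparison was equal, which here can only happen if the count ran out with every cell matching.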
-
-
- LODS
-
- The opposite of STOS is LODS (load string). It moves a byte (word)
- from the string to the AL (AX) register. This time, for a change,
- we use the SI register as a pointer, and the default segment
- register is DS.{8} As always, SI is incremented or decremented by
- 1 (2) depending on the setting of DF, the direction flag. The two
- possibilities are:
-
- lodsb
- and
- lodsw
-
- The equivalent action (not counting changing the value of SI) is:
-
- mov ax, [si] ; or AL for byte moves
-
- This is an instruction for people that write device drivers. You
- could use it if you are sending a string of characters to the
- printer, but that's about it. Code for doing that would have the
- following form:
-
- ; - - - - - - -
- buffer db 1000 dup (?)
- ; - - - - - - -
- lea si, buffer ; the buffer must be in the ds segment
- cld ; increment
-
- out_loop:
- lodsb
- and al, al ; if 0, end of string
- jz quit_loop
-
- mov dl, al ; move character to dl {9}
- mov ah, 5 ; int 21h function 5
- int 21h ; print a character
- jmp out_loop
-
- quit_loop:
- ; continue with the program
-
- If you actually run this program, many printers will not print
- anything until they get an end of line signal (13d, 10d).
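LODS also pairs naturally with STOS when each byte needs some processing on its way from one string to another. The sketch below copies a 0-terminated string and converts lower case letters to upper case as it goes. It is an illustration, not code from the original: src_text, dst_text and the labels are hypothetical, and DS and ES are both assumed to hold the segment of these variables.

```asm
; - - - - - - -
src_text  db  'strings', 0              ; hypothetical source, 0 terminated
dst_text  db  8 dup (?)                 ; room for the copy and its 0
; - - - - - - -

        lea     si, src_text            ; source is DS:SI
        lea     di, dst_text            ; destination is ES:DI
        cld                             ; increment (DF = 0)

copy_loop:
        lodsb                           ; al = ds:[si], then SI = SI + 1
        cmp     al, 'a'
        jb      store_it                ; below 'a': leave it alone
        cmp     al, 'z'
        ja      store_it                ; above 'z': leave it alone
        sub     al, 'a' - 'A'           ; lower case -> upper case

store_it:
        stosb                           ; es:[di] = al, then DI = DI + 1
        and     al, al                  ; was that the terminating 0?
        jnz     copy_loop

        ; dst_text now holds the upper case copy, 0 included
```

STOSB does not change the flags, so the test for the terminating 0 is done explicitly after the store; this way the 0 is copied too.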
-
-
- ____________________
-
- 8. Register DS can be overridden. We'll talk about that in the
- second part of this chapter.
-
- 9. Int 21h (AH = 5) prints one character from DL to the
- printer. Why it's DL and not AL is a mystery.
-
-